Model editing
The goal of model editing is to use a single input / desired-output pair to alter a base model's output for that input as well as for its equivalence neighborhood (related input/output pairs), while leaving model behavior on unrelated inputs unchanged (Mitchell2022fast).
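A minimal sketch of how such an edit is usually scored, assuming exact-match string outputs and illustrative names (nothing below comes from Mitchell2022fast): success on the edit pair itself, generalization to its equivalence neighborhood, and locality on unrelated inputs.

```python
from typing import Callable, Dict, List, Tuple

def evaluate_edit(
    edited_model: Callable[[str], str],
    base_model: Callable[[str], str],
    edit_pair: Tuple[str, str],
    neighborhood: List[Tuple[str, str]],   # paraphrases of the edit input
    unrelated: List[str],                  # inputs the edit must not affect
) -> Dict[str, float]:
    """Score an edit: did it take, does it generalize, is it local?"""
    x_e, y_e = edit_pair
    success = float(edited_model(x_e) == y_e)
    gen = sum(edited_model(x) == y for x, y in neighborhood) / max(len(neighborhood), 1)
    loc = sum(edited_model(x) == base_model(x) for x in unrelated) / max(len(unrelated), 1)
    return {"edit_success": success, "generalization": gen, "locality": loc}
```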
Does model editing actually 'update' the "knowledge" that the model has? When a human changes a belief, that can trigger a cascade of updates to related beliefs. Does the same happen in LLMs?
Why doesn't fine-tuning work? "Fine-tuning on a single example tends to overfit." (Mitchell2022fast; see also Zhu et al. 2020 and De Cao et al. 2021.)
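A sketch of that naive baseline on a toy linear model rather than a real LM (the setup is illustrative, not taken from the cited papers): repeatedly descending on the single edit example drives its loss to zero, but nothing constrains behavior on other inputs.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 4)        # stand-in for a large LM
x_edit = torch.randn(1, 8)           # the single edit input
y_edit = torch.tensor([2])           # the desired output
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):                 # many steps on one example
    loss = torch.nn.functional.cross_entropy(model(x_edit), y_edit)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The edit "takes", but the update is unconstrained elsewhere: outputs on
# unrelated inputs drift, which is the overfitting problem editors try to avoid.
```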
An early study: Sinitsin et al. 2020, but it is not very efficient.
De Cao et al. 2021: more efficient, but fails in practice.
Mitchell2022fast proposes MEND (Model Editor Networks using Gradient Decomposition).
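The structure MEND builds on: for a linear layer, the per-example gradient of the loss with respect to the weight matrix is a rank-1 outer product of the output gradient and the layer input, so an editor network can operate on those two small vectors instead of the full weight matrix. A rough sketch of that structure is below; the `editor_x` / `editor_d` networks are placeholders, not the trained MEND hypernetwork.

```python
import torch

torch.manual_seed(0)
d_in, d_out = 8, 4
layer = torch.nn.Linear(d_in, d_out, bias=False)
x = torch.randn(1, d_in)
y_target = torch.tensor([2])

out = layer(x)
loss = torch.nn.functional.cross_entropy(out, y_target)
delta = torch.autograd.grad(loss, out, retain_graph=True)[0]   # dL/d(out)
grad_W = torch.autograd.grad(loss, layer.weight)[0]            # dL/dW

# Rank-1 identity for a single example: dL/dW == delta^T @ x.
assert torch.allclose(grad_W, delta.T @ x, atol=1e-6)

# Placeholder editor: transforms (x, delta) and applies the edit as a
# rank-1 weight update, mirroring the low-rank form MEND works in.
editor_x = torch.nn.Linear(d_in, d_in)
editor_d = torch.nn.Linear(d_out, d_out)
with torch.no_grad():
    x_t, d_t = editor_x(x), editor_d(delta)
    layer.weight -= 0.1 * (d_t.T @ x_t)   # edited weights
```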